Classification of Efficient Imputation Method for Analyzing Missing Values
نویسنده
چکیده
In Statistical analysis, missing data is a common problem for data quality. Many real datasets have missing data. Imputation preserves all cases by replacing missing data with a probable value based on other available information. Once all missing values have been imputed, the data set can be analyzed using standard techniques for complete data. This paper aim is to describe the efficient imputation method like Mean, Median, Refined Mean, Standard Deviation, Linear Regression, Discretization based method and some of clustering techniques like K-Mean and KNN methods which are used for imputing missing values in the dataset. The datasets are taken from the UCI ML repository. The results are compared in terms of accuracy. Keywords— Clustering Techniques, Discretization, K-Mean, KNN, Mean, Median, Refined Mean, Standard Deviation.
منابع مشابه
Performance evaluation of different estimation methods for missing rainfall data
There are numerous methods to estimate missing values of which some are used depending on the data type and regional climatic characteristics. In this research, part of the monthly precipitation data in Sarab synoptic station, east Azerbaijan province, Iran was randomly considered missing values. In order to study the effectiveness of various methods to estimate missing data, by seven classic s...
متن کاملA Novel Approach for Imputation of Missing Attribute Values for Efficient Mining of Medical Datasets - Class Based Cluster Approach
Missing attribute values are quite common in the datasets available in the literature. Missing values are also possible because all attributes values may not be recorded and hence unavailable due to several practical reasons. For all these one must fix missing attribute vales if the analysis has to be done. Imputation is the first step in analyzing medical datasets. Hence this has achieved sign...
متن کاملچند رویکرد برخورد با مقادیر گمشده متغیرهای کمی و بررسی اثر آنها بر نتایج حاصل از یک کارآزمایی بالینی
Background and Objectives: A major challenge that affects the longitudinal studies is the problem of missing data. Missing in the data may result in the loss of part of the information which reduces the accuracy of the estimator and obtain the results will be biased and inaccurate. Therefore, it is necessary to evaluate the missing data mechanism from a longitudinal research and to consider thi...
متن کاملInfluence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons
Background Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. Presence of missing data challenges the practice of model development. Several studies suggested that performance of imputation methods is acceptable when missing rate is moderate. One of the issues which was of less concern...
متن کاملMissing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Frequent researches have been done on the use of diffe...
متن کامل